Before working with the package rjags you will need to separately install JAGS.
In this example, we build a linear regression model, and include to finish, add a random effect. Models are run in JAGS.
library(rjags)
Loading required package: coda
Linked to JAGS 4.3.0
Loaded modules: basemod,bugs
# load lme4 only to access the dataset
# We could just address it long hand via lme4::sleepstudy but im being lazy.
library(lme4)
Loading required package: Matrix
library(ggplot2)
We will use the ‘sleepstudy’ dataset from the package lme4 which matches the simple linear effects model with a random effect in the example for the general linear mixed effects fitting function ?lme4::lmer.
From the help file associated with ?sleepstudy: The average reaction time per day for subjects in a sleep deprivation study. On day 0 the subjects had their normal amount of sleep. Starting that night they were restricted to 3 hours of sleep per night. The observations represent the average reaction time on a series of tests given each day to each subject.
Ultimately we might fit a mixed effects model comprising a linear effect of the variable Days on the average Reaction time, with a random effect for each Subject but we will build toward this gradually. While it might be tempting to start with a linear regression and add in the random effect afterwards, I think it is conceptually easier to understand the random effect if we start by building a variance components model which is essentially a hierarchical model with a grand mean and a set of nested variances. We can then easily add in the linear part afterwards.
But first… lets build the basic model that is just a single grand mean and a single error term. We might call this our Null Model since it really is as simple as we can get and if we cant improve on this model then we cant really say anything about what is affecting reaction times other than it has a number and everyone varies around that number randomly.
This model basically comprises a mean and a standard deviation and so we could visualise our data a priori as either a histogram, or as a boxplot, or as an errorbar plot.
# create a histogram of the Reacion time data, and add some extra
# white space above to allow us add an error bar plot to represent the
# mean and standard deviation.
hist(sleepstudy$Reaction, breaks = 10, ylim = c(0, 35))
# add a point and text for the mean
mu <- mean(sleepstudy$Reaction)
points(mu, 30, pch = 20)
text(mu, 33, labels = round(mu))
# add a horizontal line for the +/- sd
ss <- sd(sleepstudy$Reaction)
lines(c(mu - ss, mu + ss), c(30, 30), col = "black")
First we define our model as a string for JAGS.
# Define our model
modelstring <- '
model {
# Likelihood
for (i in 1:N){
Y[i] ~ dnorm(mu[i], tau)
mu[i] <- b0
}
# ---------------------------------------
# Priors
# prior on the intercept / grand mean
b0 ~ dnorm(0, 100 ^ -2)
# prior on the standard deviation of the error term
# which is then converted to precision.
sigma ~ dunif(0, 100)
tau <- 1 / (sigma ^ 2)
}
'
Now we need to bundle up our data and pass it to rjags for fitting.
# Set up data
data = list( N = nrow(sleepstudy),
Y = sleepstudy$Reaction)
# open a text connection for our modelstring. This has the effect of
# creating a file that doesnt exist on our drives as we consider a *.txt file
# to exist, but essentially the computer then treats them the same.
model_connection <- textConnection(modelstring)
# Run jags
model_null <- jags.model(model_connection, data = data,
n.chains = 3, quiet = TRUE)
# we should then close our text connection that we made
close(model_connection)
output <- coda.samples(model=model_null,
variable.names=c("b0", "sigma"),
n.iter = 5000,
thin = 5)
|
| | 0%
|
|* | 2%
|
|** | 4%
|
|*** | 6%
|
|**** | 8%
|
|***** | 10%
|
|****** | 12%
|
|******* | 14%
|
|******** | 16%
|
|********* | 18%
|
|********** | 20%
|
|*********** | 22%
|
|************ | 24%
|
|************* | 26%
|
|************** | 28%
|
|*************** | 30%
|
|**************** | 32%
|
|***************** | 34%
|
|****************** | 36%
|
|******************* | 38%
|
|******************** | 40%
|
|********************* | 42%
|
|********************** | 44%
|
|*********************** | 46%
|
|************************ | 48%
|
|************************* | 50%
|
|************************** | 52%
|
|*************************** | 54%
|
|**************************** | 56%
|
|***************************** | 58%
|
|****************************** | 60%
|
|******************************* | 62%
|
|******************************** | 64%
|
|********************************* | 66%
|
|********************************** | 68%
|
|*********************************** | 70%
|
|************************************ | 72%
|
|************************************* | 74%
|
|************************************** | 76%
|
|*************************************** | 78%
|
|**************************************** | 80%
|
|***************************************** | 82%
|
|****************************************** | 84%
|
|******************************************* | 86%
|
|******************************************** | 88%
|
|********************************************* | 90%
|
|********************************************** | 92%
|
|*********************************************** | 94%
|
|************************************************ | 96%
|
|************************************************* | 98%
|
|**************************************************| 100%
Tasks: check that the predictions from our Bayesian model match the simple maximum likelihood calculations of the mean and standard deviation of the reaction times.
We may well recognise that there is variation both within and among individual Subjects. For one, identifying where the majority of this variation is would be of interest in terms of directing future efforts either to finding explanatory variables within or among individuals. From a statistical perspective, it is vital to control for this variation when testing among individual effects, as otherwise our calculations of power and hence effect size and significance can be (very far) off.
# create a boxplot of the Reaction time data broken down by Subject.
# Add horizontal lines for teh grand mean (mu) and grand standard deviation (ss)
# which we created earlier in our first plot of the null model.
g1 <- ggplot(sleepstudy, aes(x = Subject, y = Reaction)) +
geom_boxplot() +
geom_abline(mapping = aes(intercept = mu, slope = 0), col = "red") +
geom_abline(mapping = aes(intercept = mu - ss, slope = 0),
col = "red", linetype = 2) +
geom_abline(mapping = aes(intercept = mu + ss, slope = 0),
col = "red", linetype = 2)
# print this figure to screen
print(g1)
Clearly there is considerable variation within subjects, with some individuals being very much lower than the overall mean, and others much higher. There appears at least by eye, for there to be more variation among individuals than within, where reaction times by individual seem to be reasonably consistent with a few exceptions. Bear in mind there is additional data in the form of a linear effect variable of the experimentally manipulated amount of sleep each subject received prior to testing, and this may help further explain the variation we observe.
The random effect in this example is by Subject, with multiple observations made on the same subject. In this simple random intercept model, we want to allow individuals to differ from their expected value (the intercept) by a (hopefully relatively) small deviation. The mean of each individuals’ deviations will be described by a normal distribution, and by a variable \(U[j]\) for the jth individual described by \(U_j \sim N(0, \sigma^2_U)\).
We define a new jags model
# Define our model
modelstring <- '
model {
# Likelihood
for (i in 1:N){
Y[i] ~ dnorm(mu[i], tau)
# The code U[Subject[i]] looks up the subject identifier for the ith
# individual and extracts the corresponding deviation from U which
# is defined after the loop over i.
mu[i] <- b0 + U[Subject[i]]
}
# random parts
#
# Pull out the random deviations for the variable U, one for each
# subject
for (j in 1:S){
U[j] ~ dnorm(0, tau_U)
}
# we can if we want calcluate the total variance, which is the simple
# sum of variances. We would need to monitor this variable if we want to
# see it. Remember: while variances are additive, neither
# standard deviations nor precisions are.
var_tot <- (sigma_U ^ 2) + (sigma ^ 2)
# ------------------------------------------
# Priors
# prior on the grand mean b0 (same as the intercept)
b0 ~ dnorm(0, 100 ^ -2)
# prior on the residual error termn
sigma ~ dunif(0, 100)
tau <- 1 / (sigma ^ 2)
# prior on the Subject level error term
sigma_U ~ dunif(0, 100)
tau_U <- 1 / (sigma_U ^ 2)
# ------------------------------------------
}
'
Now pass on the additional data which is \(S\) the number of subjects in the dataset, and \(Subject\) which is the column Subject in the dataset.
The only thing we need to do is convert this factor labelled column which starts at subject 308 and runs to 372, into a sequence starting at 1 and running to 18: this is acheived easily by using as.numeric(sleepstudy$Subject).
# Set up data
data <- list( N = nrow(sleepstudy),
S = length(levels(sleepstudy$Subject)),
Subject = as.numeric(sleepstudy$Subject),
Y = sleepstudy$Reaction)
# open a text connection for our modelstring.
model_connection <- textConnection(modelstring)
# Run jags
model_random <- jags.model(model_connection, data = data,
n.chains = 3, quiet = TRUE)
# we should then close our text connection that we made
close(model_connection)
output <- coda.samples(model = model_random,
variable.names=c("b0", "sigma", "sigma_U"),
n.iter = 5000,
thin = 5)
|
| | 0%
|
|* | 2%
|
|** | 4%
|
|*** | 6%
|
|**** | 8%
|
|***** | 10%
|
|****** | 12%
|
|******* | 14%
|
|******** | 16%
|
|********* | 18%
|
|********** | 20%
|
|*********** | 22%
|
|************ | 24%
|
|************* | 26%
|
|************** | 28%
|
|*************** | 30%
|
|**************** | 32%
|
|***************** | 34%
|
|****************** | 36%
|
|******************* | 38%
|
|******************** | 40%
|
|********************* | 42%
|
|********************** | 44%
|
|*********************** | 46%
|
|************************ | 48%
|
|************************* | 50%
|
|************************** | 52%
|
|*************************** | 54%
|
|**************************** | 56%
|
|***************************** | 58%
|
|****************************** | 60%
|
|******************************* | 62%
|
|******************************** | 64%
|
|********************************* | 66%
|
|********************************** | 68%
|
|*********************************** | 70%
|
|************************************ | 72%
|
|************************************* | 74%
|
|************************************** | 76%
|
|*************************************** | 78%
|
|**************************************** | 80%
|
|***************************************** | 82%
|
|****************************************** | 84%
|
|******************************************* | 86%
|
|******************************************** | 88%
|
|********************************************* | 90%
|
|********************************************** | 92%
|
|*********************************************** | 94%
|
|************************************************ | 96%
|
|************************************************* | 98%
|
|**************************************************| 100%
And plot the output
# Plot output
plot(output)
Tasks: We could if we wanted, monitor the variable U which is calculated in the jags model we specified. This would give us the subject level effects around the intercept (which in this type of model is the global mean). If we do this, what additional information do we now get in the output object? What might we do to condense this information and make sense of it?
Adding a linear effect of Days of sleep deprivation now creates what many might call a mixed effects model. Really we are just extending our model within the framework of General Linear Models and personally I find much of the nomenclature around specific subsets of models to be unhelful (and I include variance components in this).
We define a new jags model that adds a linear effect of Days: \(b_1 * \text{Days}\) and we remember to specify a prior for this unknown parameter \(b_1\).
# Define our model
modelstring <- '
model {
# Likelihood
for (i in 1:N){
Y[i] ~ dnorm(mu[i], tau)
# The code U[Subject[i]] looks up the subject identifier for the ith
# individual and extracts the corresponding deviation from U which
# is defined after the loop over i.
mu[i] <- b0 + b1 * X[i] + U[Subject[i]]
}
# random parts
#
# Pull out the random deviations for the variable U, one for each
# subject
for (j in 1:S){
U[j] ~ dnorm(0, tau_U)
}
# ------------------------------------------
# Priors
# prior on the grand mean b0 (same as the intercept)
b0 ~ dnorm(0, 100 ^ -2)
# prior on the slope of the effect of Days
b1 ~ dnorm(0, 100 ^ -2)
# prior on the residual error termn
sigma ~ dunif(0, 100)
tau <- 1 / (sigma ^ 2)
# prior on the Subject level error term
sigma_U ~ dunif(0, 100)
tau_U <- 1 / (sigma_U ^ 2)
# ------------------------------------------
}
'
# Set up data
data <- list( N = nrow(sleepstudy),
S = length(levels(sleepstudy$Subject)),
Subject = as.numeric(sleepstudy$Subject),
X = sleepstudy$Days,
Y = sleepstudy$Reaction)
# open a text connection for our modelstring.
model_connection <- textConnection(modelstring)
# Run jags
model_rand_int <- jags.model(model_connection, data = data,
n.chains = 3, quiet = TRUE)
# we should then close our text connection that we made
close(model_connection)
output <- coda.samples(model=model_rand_int,
variable.names=c("b0", "b1", "sigma", "sigma_U"),
n.iter = 5000,
thin = 5)
|
| | 0%
|
|* | 2%
|
|** | 4%
|
|*** | 6%
|
|**** | 8%
|
|***** | 10%
|
|****** | 12%
|
|******* | 14%
|
|******** | 16%
|
|********* | 18%
|
|********** | 20%
|
|*********** | 22%
|
|************ | 24%
|
|************* | 26%
|
|************** | 28%
|
|*************** | 30%
|
|**************** | 32%
|
|***************** | 34%
|
|****************** | 36%
|
|******************* | 38%
|
|******************** | 40%
|
|********************* | 42%
|
|********************** | 44%
|
|*********************** | 46%
|
|************************ | 48%
|
|************************* | 50%
|
|************************** | 52%
|
|*************************** | 54%
|
|**************************** | 56%
|
|***************************** | 58%
|
|****************************** | 60%
|
|******************************* | 62%
|
|******************************** | 64%
|
|********************************* | 66%
|
|********************************** | 68%
|
|*********************************** | 70%
|
|************************************ | 72%
|
|************************************* | 74%
|
|************************************** | 76%
|
|*************************************** | 78%
|
|**************************************** | 80%
|
|***************************************** | 82%
|
|****************************************** | 84%
|
|******************************************* | 86%
|
|******************************************** | 88%
|
|********************************************* | 90%
|
|********************************************** | 92%
|
|*********************************************** | 94%
|
|************************************************ | 96%
|
|************************************************* | 98%
|
|**************************************************| 100%
And plot and explore the output
# summaries of the posteriors
summary(output)
Iterations = 1005:6000
Thinning interval = 5
Number of chains = 3
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
b0 247.39 10.7191 0.19570 0.54797
b1 10.59 0.8035 0.01467 0.01723
sigma 31.25 1.7888 0.03266 0.03538
sigma_U 40.24 8.0724 0.14738 0.17029
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
b0 226.173 240.52 247.62 254.45 268.06
b1 9.022 10.05 10.59 11.13 12.19
sigma 27.998 30.04 31.13 32.38 35.13
sigma_U 27.442 34.52 39.19 44.57 59.48
# Plot output
plot(output)
The model above is technically a random intercept model since only the intercepts for each Subject vary and each individual subject shares the same overall effect of days of sleep deprivation on their reaction time. Since we can build any model we like in the Bayesian framework we can reasonably easily extend our model to allow both random intercepts and random slopes by individual.
# plot Reaction time by Days of sleep deprivation and colour by Subject.
# use geom_smooth() to add linear estimates for each subject, and
# suppress the error bars else the plot gets confusing
g3 <- ggplot(sleepstudy, aes(x = Days, y = Reaction, col = Subject)) +
geom_point() +
geom_smooth(method = lm, se = FALSE)
# print this figure to screen
print(g3)
Task: Before we start coding; how many slopes do we need to esimate? i.e. how many js in \(b_{1}[j]\) do we need?
Since we will be drawing our \(b_1\)s from a normal distribution as we did for the intercepts, we will need to specify a prior for both the mean and the variance for this distribution.
We therefore now have three random terms: residual error, error among the subjects’ intercepts and error among the subjects’ response to sleep deprivation.
# Define our model
modelstring <- '
model {
# Likelihood
for (i in 1:N){
Y[i] ~ dnorm(mu[i], tau)
# The code U[Subject[i]] looks up the subject identifier for the ith
# individual and extracts the corresponding deviation from U which
# is defined after the loop over i.
mu[i] <- b0 + b1[Subject[i]] * X[i] + U[Subject[i]]
}
# random parts
#
# Pull out the random deviations for the variable U, one for each
# subject.
# We can also draw the slopes for our individual subjects
for (j in 1:S){
U[j] ~ dnorm(0, tau_U)
b1[j] ~ dnorm(mu_b1, tau_b1)
}
# ------------------------------------------
# Priors
# prior on the grand mean b0 (same as the intercept)
b0 ~ dnorm(0, 100 ^ -2)
# prior on the mean and variabnce of slope of the effect of Days
mu_b1 ~ dnorm(0, 100 ^ -2)
sigma_b1 ~ dunif(0, 100)
tau_b1 <- 2 / (sigma_b1 ^ 2)
# prior on the residual error termn
sigma ~ dunif(0, 100)
tau <- 1 / (sigma ^ 2)
# prior on the Subject level error term
sigma_U ~ dunif(0, 100)
tau_U <- 1 / (sigma_U ^ 2)
# ------------------------------------------
}
'
# Set up data
data <- list( N = nrow(sleepstudy),
S = length(levels(sleepstudy$Subject)),
Subject = as.numeric(sleepstudy$Subject),
X = sleepstudy$Days,
Y = sleepstudy$Reaction)
# open a text connection for our modelstring.
model_connection <- textConnection(modelstring)
# Run jags
model_rand_slopes <- jags.model(model_connection, data = data,
n.chains = 3, quiet = TRUE)
# we should then close our text connection that we made
close(model_connection)
output <- coda.samples(model = model_rand_slopes,
variable.names=c("b0", "mu_b1", "sigma_b1",
"sigma", "sigma_U"),
n.iter = 5000,
thin = 5)
|
| | 0%
|
|* | 2%
|
|** | 4%
|
|*** | 6%
|
|**** | 8%
|
|***** | 10%
|
|****** | 12%
|
|******* | 14%
|
|******** | 16%
|
|********* | 18%
|
|********** | 20%
|
|*********** | 22%
|
|************ | 24%
|
|************* | 26%
|
|************** | 28%
|
|*************** | 30%
|
|**************** | 32%
|
|***************** | 34%
|
|****************** | 36%
|
|******************* | 38%
|
|******************** | 40%
|
|********************* | 42%
|
|********************** | 44%
|
|*********************** | 46%
|
|************************ | 48%
|
|************************* | 50%
|
|************************** | 52%
|
|*************************** | 54%
|
|**************************** | 56%
|
|***************************** | 58%
|
|****************************** | 60%
|
|******************************* | 62%
|
|******************************** | 64%
|
|********************************* | 66%
|
|********************************** | 68%
|
|*********************************** | 70%
|
|************************************ | 72%
|
|************************************* | 74%
|
|************************************** | 76%
|
|*************************************** | 78%
|
|**************************************** | 80%
|
|***************************************** | 82%
|
|****************************************** | 84%
|
|******************************************* | 86%
|
|******************************************** | 88%
|
|********************************************* | 90%
|
|********************************************** | 92%
|
|*********************************************** | 94%
|
|************************************************ | 96%
|
|************************************************* | 98%
|
|**************************************************| 100%
And plot the output
# summaries of the posteriors
summary(output)
Iterations = 1005:6000
Thinning interval = 5
Number of chains = 3
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
b0 249.965 7.449 0.13600 0.31129
mu_b1 10.523 1.742 0.03180 0.03059
sigma 25.779 1.531 0.02796 0.02796
sigma_U 27.225 6.670 0.12177 0.15454
sigma_b1 9.189 2.007 0.03664 0.04117
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
b0 235.139 245.301 250.023 254.70 264.74
mu_b1 7.172 9.390 10.516 11.61 14.06
sigma 23.109 24.711 25.662 26.71 29.15
sigma_U 16.258 22.564 26.443 30.93 42.85
sigma_b1 6.027 7.785 8.946 10.30 13.77
# Plot output
plot(output)
Tasks: as before, we might want to extract the actual slopes for each individual. What parameter would we add to monitor list to acheive this? Do these estimates match the visualisation of the data we made above?
Bayesian models have DIC as an analogue to the maximum likelihood founded AIC scores, and they work in much the same way: lower scores indicate improved model fit balanced against the number of parameters involved. We can use dic.samples() to quickly caluclate these scores for our four models. Just bear in mind that as with frequentist or maximum likelihood approaches, calculation of the number of parameters in random effects models can be complicated, and in many instances dic.samples() may not run or be reliable.
I have opted for just 1000 samples on which to calculate DIC, but you may want to increase this.
# first the null model
dic.samples(model_null)
# then our two level random effects model
dic.samples(model_random)
# now you can do the same here for the random intercept an random slopes models.
Task: modify the R chunk above and use DIC of the four models to decide whic model is most appropriate to the data.